ABSTRACT


Updating formulas for the forecasts of a one-parameter autoregressive
model are obtained when the parameter is assumed random. It is shown
that the updated forecasts are similar to those derived from
exponentially weighted moving average forecasts, with the important
difference that forecasts can lie outside the interval containing the
old forecast and the new observation. Based on the "growth" of the new
observations, the updated confidence intervals may become larger or
smaller than the old ones. Similarities to and differences between a
Box-Jenkins model, a Kalman Filter and a model proposed by Makridakis
and Wheelwright are illustrated.


1.  INTRODUCTION 


In several industrial and government problems to which I have been
exposed, practitioners suggested that some sort of exponentially weighted
moving average (EWMA) forecast should be used even though it was recognized
that the underlying structure of the process was not described by an
ARIMA (0,1,1) model (see Box and Jenkins [1976], Brown [1961]).

As is well known, the EWMA models have the appealing and practical
feature that a forecast $\ell$ periods into the future from an origin at t
can be expressed as a weighted moving average of the historical sequence
of past observations. Thus the forecast at t can be expressed as a
linear interpolation of the previous forecast and the new observation:

(1a)  $\hat{z}_t(\ell) = E[z_{t+\ell} \mid z_t, z_{t-1}, \ldots] = E[z_{t+\ell} \mid H_t^{(z)}] = \hat{z}_t(1) = (1 - \theta)\,\hat{z}_{t-1}(1) + \theta z_t, \qquad \ell \ge 1,\ |\theta| < 1.$

The confidence intervals in these forecasts can also be estimated:

(1b)  $v_t(\ell) = \mathrm{Var}[z_{t+\ell} \mid H_t^{(z)}] = \big(1 + (\ell - 1)(1 - \theta)^2\big)\,\sigma_a^2,$

where $\sigma_a^2$ is the variance of the stationary noise distribution.

Some of the features of (1) which appeal to practitioners are the
minimal data storage requirements, the fact that new data is introduced
into the updated forecasts by weighting the most recent observation,
that distant historical observations are weighted less heavily than
recent data, and that the forecasts are based on a first-order moving
average equation of motion.

If one uses Box and Jenkins methods for forecasting autoregressive
and moving average processes (as in the EWMA model), parameters are
estimated by any one of several procedures and then assumed to be fixed
throughout an interval of time in which forecasts are used. One is led
to ask the natural question: Should the parameters, such as $\theta$ in (1),
be updated and if so, how frequently? In the model presented in this
paper we assume that the parameter itself is a random variable and is
reestimated as frequently as forecasts are revised. The results obtained
from a simple adaptive model, in which one requires that parameters be
updated hand in hand with forecasts, give numerous insights into how one
might deal with changes or discontinuities in the data.

In the developments that follow we have occasion to apply Bayes'
Theorem to a formula for updating a Gaussian distribution based on the
observation of the random variable x. If we assume that

$x \mid m \sim N(m, \sigma_x^2), \qquad m \sim N(\mu_m, \sigma_m^2),$

it follows from Bayes' Theorem (De Groot [1976]) that the conditional
distribution of m given the observation x is

$m \mid x \sim N(\mu'_m, \sigma'^2_m)$

with updated mean and variance given by

(2a)  $\mu'_m = \dfrac{\sigma_x^2\,\mu_m + \sigma_m^2\,x}{\sigma_x^2 + \sigma_m^2}$

(2b)  $\sigma'^2_m = \dfrac{\sigma_x^2\,\sigma_m^2}{\sigma_x^2 + \sigma_m^2}.$

Equation (2a) has the feature that the new expectation $\mu'_m$ is a linear
interpolation of the old expectation and the new data. If $\sigma_m^2$ is
large the new data x is weighted heavily; otherwise $\mu'_m$ is close
to $\mu_m$. In (2b) $\sigma'^2_m < \sigma_m^2$ due to the fact that a new
piece of data has been obtained and one's new estimate of m is more precise.
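
The Gaussian update in (2a) and (2b) can be sketched in a few lines of
Python. This is only an illustration: it assumes the standard
conjugate-normal form of the update, and the function name and the
numerical values are hypothetical, not taken from the paper.

```python
def gaussian_update(mu_m, var_m, x, var_x):
    """Posterior mean and variance of m after observing x ~ N(m, var_x),
    given the prior m ~ N(mu_m, var_m); equations (2a) and (2b)."""
    total = var_x + var_m
    mu_post = (var_x * mu_m + var_m * x) / total   # (2a): interpolation
    var_post = var_x * var_m / total               # (2b): always < var_m
    return mu_post, var_post


# Vague prior (large var_m): the posterior mean moves close to the data x.
print(gaussian_update(mu_m=0.0, var_m=100.0, x=2.0, var_x=1.0))
# Tight prior (small var_m): the posterior mean stays near the prior mean.
print(gaussian_update(mu_m=0.0, var_m=0.01, x=2.0, var_x=1.0))
```

The two calls illustrate the remark about (2a): the weight given to the
new data grows with the prior variance.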
2.  THE FORECASTING MODEL

To deal with a situation where the parameter is not known with
certainty, we make the following assumptions regarding the underlying
equation of motion for the time series $\{z_t\}$:

(A1)  $z_{t+1} = \phi_{t+1} z_t + a_{t+1}, \qquad t = 0, 1, 2, \ldots,$

(A2)  $a_t \sim N(0, \sigma_a^2)$, iid,

(A3)  $\phi_{t+1} \mid H_t^{(z)} \sim N\big((\mu_\phi)_{t+1}, (\sigma_\phi^2)_{t+1}\big).$

(A1) is the equation of motion for an autoregressive model of order 1.
$a_{t+1}$ is a random shock at time t and distributed according to (A2).
$\phi_{t+1}$ is the autoregressive parameter whose distribution prior to
observing $z_{t+1}$ is given by (A3), where the moments $(\mu_\phi)_{t+1}$
and $(\sigma_\phi^2)_{t+1}$ depend in a known way on the history of the
$z_t$ process: $H_t^{(z)} = \{z_t, z_{t-1}, \ldots\}$. In both (A2)
and (A3) the random variables are Gaussian.

The timing of events is as follows: based on a prior estimate of
the mean of $\phi_{t+1}$ we make the one-period forecasts

(3a)  $\hat{z}_t(1) = E[z_{t+1} \mid z_t] = \mu_\phi z_t$

and

(3b)  $v_t(1) = \mathrm{Var}[z_{t+1} \mid z_t] = z_t^2 \sigma_\phi^2 + \sigma_a^2,$

where, for simplicity, the subscript on $\mu_\phi$ and $\sigma_\phi^2$ is
temporarily deleted. The observation $z_{t+1}$ occurs and is recorded.
Based on this observation we obtain a new history $H_{t+1}^{(z)}$ and then
use Bayes' Theorem to reestimate the $\phi$ distribution with updated
parameters $(\mu'_\phi)_{t+1}$ and $(\sigma'^2_\phi)_{t+1}$. Note that this
problem is not as trivial as it may seem, for the simple reason that the
realized value of $z_{t+1}$ is affected by two random variables, $a_{t+1}$
and $\phi_{t+1}$. Any algorithm which computes $(\mu'_\phi)_{t+1}$ and
$(\sigma'^2_\phi)_{t+1}$ must decide how to allocate the forecast residual
$z_{t+1} - \hat{z}_t(1)$ to random occurrences in both $a_{t+1}$ and
$\phi_{t+1}$. We now make assumption

(A4)  $E[\phi_{t+2} \mid H_{t+1}^{(z)}] = E[\phi_{t+1} \mid H_{t+1}^{(z)}] = (\mu_\phi)_{t+2} = (\mu'_\phi)_{t+1} = \mu'_\phi.$

In other words the realization of $\phi_{t+2}$ conditional on the
historical observations $z_{t+1}, z_t, \ldots$ has a prior expectation
equal to its most recent posterior expectation. We are then in a position
to recompute the next period forecast as

(4a)  $\hat{z}_{t+1}(1) = (\mu_\phi)_{t+2}\, z_{t+1} = \mu'_\phi z_{t+1}.$

In the special case where $\phi_1 = \phi_2 = \cdots = \phi$, a fixed but
unknown quantity, $\phi_{t+2}$ will be distributed according to the
posterior distribution of $\phi_{t+1}$, i.e.,
$(\sigma_\phi^2)_{t+2} = (\sigma'^2_\phi)_{t+1}$. Thus, we also have

(4b)  $v_{t+1}(1) = z_{t+1}^2 \sigma'^2_\phi + \sigma_a^2.$


One of the main results of this paper is to show how to proceed from the
forecasts in (3) to the revised forecasts in (4) without need to explicitly
compute the updated parameters. In fact $\mu'_\phi$ is never explicitly
required even though it is carried implicitly throughout the analyses. The
derivation of results for the revised forecast is straightforward. It
follows from (A1) and (A2) that conditionally on $\phi_{t+1}$ the growth in
$z_t$ is

(5a)  $d_{t+1} = \dfrac{z_{t+1}}{z_t}, \qquad d_{t+1} \mid \phi_{t+1} \sim N\big(\phi_{t+1},\, z_t^{-2}\sigma_a^2\big).$

With $d_{t+1}$ substituted for x, $z_t^{-2}\sigma_a^2$ for $\sigma_x^2$,
$(\mu_\phi)_{t+1}$ for $\mu_m$ and $(\sigma_\phi^2)_{t+1}$ for $\sigma_m^2$
in (2a) and (2b), and using Assumption (A3), the posterior distribution of
$\phi_{t+1}$ conditional on the newly observed growth is

(5b)  $\phi_{t+1} \mid d_{t+1} \sim N(\mu'_\phi, \sigma'^2_\phi),$

where updated $\mu'_\phi$ and $\sigma'^2_\phi$ (subscripts deleted) are now
given by

(6a)  $\mu'_\phi = \begin{cases} \alpha_t \mu_\phi + (1 - \alpha_t)\,\dfrac{z_{t+1}}{z_t}, & z_t \ne 0 \\ \mu_\phi, & z_t = 0 \end{cases}$

(6b)  $\sigma'^2_\phi = \alpha_t \sigma_\phi^2$

with

(6c)  $\alpha_t = \sigma_a^2 \big(\sigma_a^2 + z_t^2 \sigma_\phi^2\big)^{-1}.$


To obtain the revised forecast of $z_{t+2}$ at origin t + 1 in terms of
the old forecast at t, we substitute $\mu'_\phi$ for $\mu_\phi$ and t + 1
for t in (4) and (6) to obtain

(7)  $\hat{z}_{t+1}(1) = \mu'_\phi z_{t+1} = \begin{cases} \big(\alpha_t \hat{z}_t(1) + (1 - \alpha_t) z_{t+1}\big)\,\dfrac{z_{t+1}}{z_t}, & z_t \ne 0 \\ \mu_\phi z_{t+1}, & z_t = 0 \end{cases}$

Again, for the special case where $\phi_1 = \phi_2 = \cdots = \phi$, the
updated variance is

(8)  $v_{t+1}(1) = \sigma_a^2 + z_{t+1}^2\, \sigma'^2_\phi.$
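
One cycle of the recursion, from the weight in (6c) to the revised
forecast and variance in (7) and (8), might be sketched as follows. The
function name, the data series and the prior values are hypothetical and
chosen only for illustration; carrying the posterior variance forward as
the next prior variance assumes the fixed-but-unknown parameter case.

```python
def update_forecast(z_t, z_next, mu_phi, var_phi, var_a):
    """One cycle of the random-parameter AR(1) recursion.

    Returns (alpha, mu_post, var_post, forecast, forecast_var), where the
    last two are the one-step forecast and its variance at origin t + 1.
    """
    alpha = var_a / (var_a + z_t**2 * var_phi)                 # (6c)
    if z_t != 0:
        mu_post = alpha * mu_phi + (1 - alpha) * z_next / z_t  # (6a)
    else:
        mu_post = mu_phi        # alpha == 1, so the prior mean is kept
    var_post = alpha * var_phi                                 # (6b)
    forecast = mu_post * z_next                                # (4a), (7)
    forecast_var = var_a + z_next**2 * var_post                # (8)
    return alpha, mu_post, var_post, forecast, forecast_var


# Hypothetical data and prior values, for illustration only.
series = [10.0, 11.0, 12.5, 12.0]
mu_phi, var_phi, var_a = 1.0, 0.25, 1.0
for z_t, z_next in zip(series, series[1:]):
    # Fixed-phi case: the posterior becomes the prior for the next period.
    alpha, mu_phi, var_phi, forecast, forecast_var = update_forecast(
        z_t, z_next, mu_phi, var_phi, var_a)
    print(f"alpha={alpha:.3f}  mu'={mu_phi:.4f}  "
          f"forecast={forecast:.3f}  variance={forecast_var:.3f}")
```

Only the current observation, the current mean and variance of the
parameter, and the noise variance need to be stored between periods.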


What insights can one obtain from such a model and the updating
formula for new forecasts? It seems to me there are several worth
discussing:

(i) The weights $\alpha_t = \alpha_t(z_t, \sigma_a^2, \sigma_\phi^2)$ are
data dependent. They also depend on the uncertainty in the noise and
the parameter estimate but are independent of $\mu_\phi$, the
expectation of the parameter.

(ii) The revised value for the expectation of the autoregressive
parameter, $\mu'_\phi$, is a linear interpolation of the old
expectation $\mu_\phi$ and the most recent growth measured as the
ratio of two successive observations, $z_{t+1}/z_t$. The
parameter itself is revised by a rule similar to one used
in EWMA models.

(iii) The revised value for the new forecast is a linear interpolation
of the old forecast and the new observation, multiplied by the growth
$z_{t+1}/z_t$. Thus, an extremely large or extremely small value of
$z_{t+1}/z_t$ can in one time period force the revised forecast to lie
outside the interval $(z_{t+1}, \hat{z}_t(1))$ or $(\hat{z}_t(1), z_{t+1})$
until the parameters and forecasts readjust in the periods that
follow.

(iv) As the estimate of $\phi$ becomes more certain and $\sigma_\phi^2$
becomes small, $\alpha_t$ approaches unity and
$\hat{z}_{t+1}(1) \to \hat{z}_t(1)\,\dfrac{z_{t+1}}{z_t} = \mu_\phi z_{t+1}$.

(v) Forecasts and confidence intervals can be initiated with
little or no knowledge about the historical behavior of
$\phi_t$. Furthermore, if one suspects from external considerations
that the uncertainty in one's knowledge of $\phi_t$ has increased,
one can immediately insert this new information into the weight
$\alpha_t$ and the formulas for the revised forecasts.
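
Points (iii) and (iv) can be checked numerically with a small sketch of
the revised forecast for a nonzero $z_t$. All numbers below are
hypothetical and serve only to contrast an uncertain parameter with a
nearly certain one.

```python
def revised_forecast(z_t, z_next, old_forecast, var_phi, var_a):
    """Revised one-step forecast for z_t != 0: interpolate between the old
    forecast and the new observation, then scale by the growth."""
    alpha = var_a / (var_a + z_t**2 * var_phi)   # data-dependent weight
    growth = z_next / z_t
    return (alpha * old_forecast + (1 - alpha) * z_next) * growth


# Hypothetical values: prior mean 0.9 for the parameter, large growth 1.5.
z_t, z_next, mu_phi, var_a = 10.0, 15.0, 0.9, 1.0
old = mu_phi * z_t
low, high = sorted((old, z_next))
for var_phi in (0.04, 1e-8):    # uncertain versus nearly certain parameter
    new = revised_forecast(z_t, z_next, old, var_phi, var_a)
    inside = low <= new <= high
    print(f"var_phi={var_phi:g}: revised forecast {new:.3f}, "
          f"inside [{low}, {high}]: {inside}")
```

With the larger parameter variance the growth factor pushes the revised
forecast beyond the new observation; with the very small variance the
result is essentially the prior mean times the new observation, as in
(iv).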
3.  COMPARISONS WITH OTHER MODELS

The simplest one-parameter autoregressive model (AR(1)) has an
equation of motion (Box and Jenkins [1976])

(9)  $z_{t+1} = \phi z_t + a_{t+1}, \qquad |\phi| < 1.$

In other words (9) is based on Assumptions (A1) and (A2) with $\phi$
assumed given and fixed. It is well known that future forecasts and
confidence intervals are given by

(10)  $\hat{z}_t(\ell) = \phi^{\ell} z_t, \qquad \ell \ge 1,$

(11)  $v_t(\ell) = \sigma_a^2\, \dfrac{1 - \phi^{2\ell}}{1 - \phi^2}.$

As we have already pointed out, when $\phi$ is known with certainty in
(A1) - (A3) we have $\sigma_\phi^2 = 0$, $\alpha_t = 1$ and
$\beta_t = z_{t+1}/z_t$. Equations (7) and (8) reduce to

$\hat{z}_t(1) = \hat{z}_{t-1}(1)\,\dfrac{z_t}{z_{t-1}} = \phi z_t,$

$v_t(1) = \sigma_a^2,$

in agreement with (10) and (11).
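
As a quick numerical check of the fixed-parameter case, the sketch below
evaluates the standard AR(1) forecast $\phi^{\ell} z_t$ and forecast
variance $\sigma_a^2 (1 - \phi^{2\ell})/(1 - \phi^2)$ and confirms that
the one-step variance is just the noise variance. The function name and
the values are hypothetical.

```python
def ar1_forecast(z_t, phi, var_a, lead):
    """Forecast and forecast variance `lead` periods ahead for an AR(1)
    process with known parameter phi, |phi| < 1."""
    mean = phi**lead * z_t
    var = var_a * (1 - phi**(2 * lead)) / (1 - phi**2)
    return mean, var


# Hypothetical values: phi = 0.5, noise variance 2, last observation 4.
for lead in (1, 2, 5):
    mean, var = ar1_forecast(z_t=4.0, phi=0.5, var_a=2.0, lead=lead)
    print(f"lead={lead}: forecast={mean:.4f}, variance={var:.4f}")
```

At lead 1 the variance equals the noise variance, and it grows toward
the stationary variance of the process as the lead time increases.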

A simple version of a Kalman filter (Bryson and Ho [1969]) with
(9) as its underlying equation of motion (system equation) assumes that,
in addition to (9), there is a noisy measurement or observation equation
of the form

(12)  $y_{t+1} = z_{t+1} + b_{t+1}, \qquad b_t \sim N(0, \sigma_b^2).$

$z_{t+1}$ is never directly observed but rather is estimated through the
direct observation of $y_{t+1}$ in (12). One can therefore make a prior
(before measurement of $y_{t+1}$) forecast of $z_{t+1}$ and a posterior
(after measurement) estimate of the expected or likely value of $z_{t+1}$
when it occurs. If we define the a priori and a posteriori forecasts by
$\tilde{z}_t(1)$ and $\hat{z}_t(1)$ and variances by $\tilde{v}_t(1)$ and
$\hat{v}_t(1)$ respectively, we obtain

(13a)  $\tilde{z}_t(1) = \phi\, \hat{z}_{t-1}(1)$

(13b)  $\tilde{v}_t(1) = \phi^2\, \hat{v}_{t-1}(1) + \sigma_a^2.$

Following the measurement of $y_{t+1}$ in (12), we again invoke Bayes'
Theorem to obtain the result

(14a)  $\hat{z}_t(1) = \tilde{z}_t(1) + \dfrac{\hat{v}_t(1)}{\sigma_b^2}\,\big(y_{t+1} - \tilde{z}_t(1)\big)$

(14b)  $\hat{v}_t(1) = \dfrac{\sigma_b^2\, \tilde{v}_t(1)}{\sigma_b^2 + \tilde{v}_t(1)}.$

In these expressions, $\hat{v}_t(1)/\sigma_b^2$ is conventionally referred
to as the "Kalman Gain" while the residual $y_{t+1} - \tilde{z}_t(1)$ is
the "innovation."

As is well known, this Kalman filter provides an a posteriori forecast
of "where you were" which is a linear interpolation of the a priori
forecast ("where you thought you would be") and the observation $y_{t+1}$.
The measurement equation reduces the uncertainty but the equation of
motion tends to increase it. It is clear that there are certain
similarities between the updating formulas of (13) and (14) and
those of (7) and (8). However, the differences are important enough
to mention: In this Kalman filter model the parameter $\sigma_\phi^2$ does
not appear since $\phi$ is assumed known. The weight (Kalman gain) in (14)
does not depend on past observations (as do $\alpha_t$ and $\beta_t$) but
rather on the most recent a priori estimate of variances. Furthermore the
rule for revising forecasts depends on $\sigma_b^2$ which is assumed known.
Neither (10) nor (13), (14) explicitly allow for growth in z to affect the
estimate of $\phi$.
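
A scalar version of this filter might be sketched as below, assuming the
standard predict and update steps for a known parameter and known noise
variances. Names and numbers are hypothetical. The gain is computed from
variances alone, which is the point made above: it never uses the
observations.

```python
def kalman_step(z_post_prev, v_post_prev, y_new, phi, var_a, var_b):
    """One predict/update cycle of a scalar Kalman filter with system
    equation z' = phi * z + a and measurement equation y = z + b."""
    z_prior = phi * z_post_prev                      # a priori forecast
    v_prior = phi**2 * v_post_prev + var_a           # a priori variance
    v_post = v_prior * var_b / (v_prior + var_b)     # a posteriori variance
    gain = v_post / var_b                            # Kalman gain
    z_post = z_prior + gain * (y_new - z_prior)      # correct by innovation
    return z_prior, v_prior, z_post, v_post, gain


# Hypothetical measurements and starting values, for illustration only.
z_post, v_post = 2.0, 4.0
for y in (3.0, 1.0, 2.5):
    z_prior, v_prior, z_post, v_post, gain = kalman_step(
        z_post, v_post, y, phi=0.5, var_a=1.0, var_b=2.0)
    print(f"y={y}: prior={z_prior:.3f}  posterior={z_post:.3f}  "
          f"gain={gain:.3f}  v_post={v_post:.3f}")
```

Changing the measurements in the loop changes the forecasts but leaves
the printed sequence of gains and variances unchanged.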

During the period in which an early draft of this paper has been
read by several associates it has come to my attention that the AR(1)
forecasting model can also be viewed as a special case of a Kalman filter
suggested by Harrison and Stevens [1976] in which the underlying equation
of motion is assumed to be given in terms of the parameter (rather than
the state variable $z_t$) as

(15)  $\phi_{t+1} = \phi_t + c_{t+1}, \qquad c_t \sim N(0, \sigma_c^2),$

and the "noisy" measurement equation for the parameter is written in
terms of $z_{t+1}$ as

$z_{t+1} = \phi_{t+1} z_t + a_{t+1}, \qquad a_t \sim N(0, \sigma_a^2),$

with $z_t$ given by the previous measurement. Using Assumptions (A1) - (A4)
and the application of Bayes' Theorem we have

$\phi_{t+1} \sim N(\mu_\phi, \sigma_\phi^2)$

$z_{t+1} \mid \phi_{t+1} \sim N(\phi_{t+1} z_t, \sigma_a^2)$

$\phi_{t+1} \mid z_{t+1} \sim N(\mu'_\phi, \sigma'^2_\phi)$

with

(16)  $\mu'_\phi = \mu_\phi + k_t\, e_{t+1} z_t$

and

(17)  $k_t = \sigma_\phi^2 \big(\sigma_a^2 + z_t^2 \sigma_\phi^2\big)^{-1},$

where $e_{t+1} = z_{t+1} - \hat{z}_t(1)$ is the most recent forecast residual.


An interesting version of (6a) is the revised parameter estimate

(18)  $\mu'_\phi = \mu_\phi + (1 - \alpha_t)\Big(\dfrac{z_{t+1}}{z_t} - \mu_\phi\Big) = \mu_\phi + (1 - \alpha_t)\,\dfrac{e_{t+1} z_t}{z_t^2} = \mu_\phi + \dfrac{\sigma_\phi^2}{\sigma_a^2 + z_t^2 \sigma_\phi^2}\; e_{t+1} z_t,$

which also states that the correction term one adds to $\mu_\phi$ to get
$\mu'_\phi$ is (approximately) proportional to the product of the most
recent forecast residual and the previous observation of the time series.

If we interpret $k_t$ in (17) or $z_t^{-2}(1 - \alpha_t)$ as a
data-dependent factor, then (18) is, except for the structure of this
factor, similar to the proposals of Makridakis and Wheelwright [1976].
They suggest that forecasts should be written in the form

(19)  $\hat{z}_{t+1} = \phi z_t, \qquad \phi' = \phi + K e_{t+1} z_t,$

with $e_{t+1}$ the forecast error and K a "training constant" or "learning
factor" derived from experience with the data. What is appealing about
formulas (17), (18) and (19) is that corrections are proportional to the
product of forecast error and the value of the "state" variable, a concept
which has been applied successfully in many important mechanical and
electrical guidance control problems. What is present in (17), (18) but
missing from the Makridakis-Wheelwright formulation is that the "training
constant" should be data and noise dependent, i.e., it may be large or
small depending on values of $z_t^2$, $\sigma_a^2$ and $\sigma_\phi^2$.
Further results for p-th order autoregressive processes are described in
Nau and Oliver [1978].
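
The equivalence between the interpolation form (6a) and the
error-correction form (16) - (18), and the contrast with a fixed training
constant as in (19), can be checked with the sketch below. The function
names and all numerical values, including the constant K, are
hypothetical.

```python
def update_interpolation(z_t, z_next, mu_phi, var_phi, var_a):
    """Interpolation form (6a), valid for z_t != 0."""
    alpha = var_a / (var_a + z_t**2 * var_phi)
    return alpha * mu_phi + (1 - alpha) * z_next / z_t


def update_gain(z_t, z_next, mu_phi, var_phi, var_a):
    """Error-correction form (16) with the data-dependent factor (17)."""
    k_t = var_phi / (var_a + z_t**2 * var_phi)
    e_next = z_next - mu_phi * z_t        # one-step forecast residual
    return mu_phi + k_t * e_next * z_t


def update_fixed_constant(z_t, z_next, phi, K):
    """Rule of the form (19); K is a fixed, hypothetical tuning constant."""
    return phi + K * (z_next - phi * z_t) * z_t


# Hypothetical values. The first two results agree; the third uses a fixed
# K, so its correction does not adapt to z_t or to the variances.
args = dict(z_t=10.0, z_next=12.0, mu_phi=1.0, var_phi=0.01, var_a=1.0)
print(update_interpolation(**args))
print(update_gain(**args))
print(update_fixed_constant(10.0, 12.0, 1.0, K=0.002))
```

The fixed-constant rule coincides with the other two only when K happens
to equal the factor in (17) for the current observation.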

